Full text of "USPTO Patents Application 09875480"

United States Patent [19]

Ichige et al.

US005710590A

[11] Patent Number: 5,710,590

[45] Date of Patent: Jan. 20, 1998

[54] IMAGE SIGNAL ENCODING AND COMMUNICATING APPARATUS USING MEANS FOR EXTRACTING PARTICULAR PORTIONS OF AN OBJECT IMAGE

[75] Inventors: Kenji Ichige, Chigasaki; Takuya Imaide, Fujisawa; Ryuji Nishimura; Norio Yatsuda, both of Yokohama; Hiroyuki Kuriyama, Tokyo; Mayuko Oda, Kawasaki, all of Japan

[73] Assignee: Hitachi, Ltd., Tokyo, Japan

[21] Appl. No.: 418,688

[22] Filed: Apr. 7, 1995

[30] Foreign Application Priority Data

Apr. 15, 1994 [JP] Japan 6-076863
Jul. 7, 1994 [JP] Japan 6-155691

[51] Int. Cl.6 ............ H04M 11/00; H04N 7/14

[52] U.S. Cl. ............ 348/14; 348/17; 348/51; 379/96

[58] Field of Search ............ 379/53, 54, 90, 93, 96, 97, 98; 348/13, 14, 17, 51

[56] References Cited

U.S. PATENT DOCUMENTS

5,073,927 12/1991 Grube ............ 348/14
5,426,460 6/1995 Erving et al. ............ 348/14

FOREIGN PATENT DOCUMENTS

0566188 10/1993 European Pat. Off. ............ 348/17

OTHER PUBLICATIONS

A. N. Netravali, B. G. Haskell: "Digital Picture", pp. 115-119, AT&T Bell Lab. (1988).

JP-A-57-129076.

"Journal of Institute of Television Engineers of Japan", Vol. 45, No. 7 (1991), pp. 793-799.

JP-A-62-120179.

"Systematic Image Encoding", Makoto Miyahara, pp. 1-14, IPC.

JP-A-59-208983.

JP-A-4-205070.

ITU-T Recommendation H.261, Video codec for audiovisual services at p×64 kbit/s (1993).

W. F. Schreiber, "Fundamentals of Electronic Imaging Systems", Springer-Verlag (1993), p. 106.

"Encoding technology for television telephone and television conference", p. 793, Journal of Institute of Television Engineers of Japan, Vol. 47 (1991), No. 7.

JP-A-5-27346.

JP-A-3-22753.

JP-A-6-225328.

"Intelligent Image Processing", Chapter 8, pp. 132-139, Shokodo (1994), Agui and Nagahashi.

JP-A-5-22753.

Primary Examiner: Curtis Kuntz

Assistant Examiner: Stephen W. Palan

Attorney, Agent, or Firm: Antonelli, Terry, Stout & Kraus, LLP

[57] ABSTRACT

A picture communication apparatus includes an extracting circuit for extracting video data of at least one portion from video data inputted thereto, an encoder for respectively encoding the extracted video data and the remaining video data, and a multiplexer for multiplexing the encoded video data. When the video data are encoded, a predetermined amount of codes is allocated to each of the respective video data. This suppresses deterioration in picture quality when conducting image communication via a transmission path having a low transmission rate. The transmitted video is displayed on a contoured display.

3 Claims, 10 Drawing Sheets 



[Front-page drawing: block diagram of the apparatus, showing memory 120, extraction unit 125, displacement evaluator, coder, control unit, synthesize unit 132, memories 180a/180b, multiplexer 128, transmission/reception unit 129, demultiplexer 130, and decoder.]

07/26/2002, EAST Version: 1.03.0002



[Sheet 1 of 10 — FIG. 1: block diagram of the first embodiment (input terminal 20, output terminals 21 and 22, input/output terminal 23, extraction unit 24, coders 27a/27b, control unit 26, database 40, multiplexer 28, transmission/reception unit 29, demultiplexer 30, decoders 31a/31b, synthesize unit 32). FIG. 2: overall configuration (display, codec, network).]



[Sheet 2 of 10 — FIG. 3: the extracting circuit (input terminals 40 and 43, memory 41, decision unit 42, address generator 44, memory 45, output terminals 46 and 47). FIG. 4: memory layout in which extracted portions 1-3 of the input image form code A and the non-extracted portion forms code B.]
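The extraction rule that FIG. 3 implements, as the detailed description explains, is a per-part pixel test: the decision circuit 42 flags pixels whose luminance and chroma levels fall within allowance ranges set for each extraction portion (for the mouth, for example, a red-lip and a white-teeth condition), and it restricts the search to a slightly expanded copy of the previous frame's extraction area. A minimal sketch follows; all threshold values, function names, and data layouts are illustrative assumptions, not taken from the patent:

```python
# Sketch of the per-part pixel test performed by the decision circuit 42.
# Threshold values below are illustrative assumptions.

def in_range(value, lo, hi):
    return lo <= value <= hi

# One extracting condition = allowance ranges for luminance (Y) and the two
# chroma components (r-y, b-y).  A part may carry several such conditions;
# the mouth combines a "red lip" and a "white teeth" condition.
MOUTH_CONDITIONS = [
    {"y": (40, 140), "ry": (30, 90), "by": (-60, -10)},   # reddish lip pixels
    {"y": (180, 255), "ry": (-15, 15), "by": (-15, 15)},  # whitish teeth pixels
]

def classify_pixel(y, ry, by, conditions):
    """Return True if the pixel satisfies any extracting condition."""
    return any(
        in_range(y, *c["y"]) and in_range(ry, *c["ry"]) and in_range(by, *c["by"])
        for c in conditions
    )

def expand_region(mask, margin=1):
    """Slightly expand the previous frame's extraction area, yielding the
    candidate region the decision circuit reuses for the next frame."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for di in range(-margin, margin + 1):
                    for dj in range(-margin, margin + 1):
                        if 0 <= i + di < h and 0 <= j + dj < w:
                            out[i + di][j + dj] = True
    return out

def extract(frame, prev_mask, conditions):
    """frame: 2-D list of (y, ry, by) tuples; prev_mask: last frame's result."""
    candidate = expand_region(prev_mask)
    return [
        [candidate[i][j] and classify_pixel(*frame[i][j], conditions)
         for j in range(len(frame[0]))]
        for i in range(len(frame))
    ]
```

Running `extract` frame by frame and feeding each result back in as `prev_mask` mimics the frame-to-frame candidate tracking attributed to the decision circuit 42.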



[Sheet 3 of 10 — FIG. 5: multiplexed frame sequence over time, alternating code A with slices of code B (code B1 + code B2 + ... + code Bm = code B).]
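The multiplexing schedule of FIG. 5, in which the extracted portion (code A) is refreshed every frame while the non-extracted portion (code B) is divided into slices B1, B2, ..., Bm spread over m frames, can be sketched as follows. The string payloads and function names are illustrative assumptions; real codes would be bitstreams:

```python
# Sketch of the FIG. 5 schedule: code A is retransmitted every frame, while
# code B is split into m slices so that B1 + B2 + ... + Bm = B.

def split_code_b(code_b, m):
    """Split code B into m nearly equal slices."""
    n = len(code_b)
    step = -(-n // m)  # ceiling division
    return [code_b[i:i + step] for i in range(0, n, step)]

def multiplex(code_a_frames, code_b, m):
    """Yield one multiplexed payload per frame: latest code A plus one B slice."""
    slices = split_code_b(code_b, m)
    for i, code_a in enumerate(code_a_frames):
        yield code_a + slices[i % len(slices)]

frames = list(multiplex(["A0|", "A1|", "A2|", "A3|"], "BBBBBBBB", m=4))
```

Over any run of m consecutive frames, concatenating the B slices restores the full code B, which is why one complete non-extracted image arrives once per predetermined number of frames.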



[Sheet 4 of 10 — FIG. 7: block diagram of the second embodiment's encoder (input terminal 120, output terminals 121 and 122, input terminal 123, input/output terminal 124, extraction unit 125, control unit, coder 127, multiplexer 128, transmission/reception unit 129, demultiplexer 130, decoder 131, synthesize unit 132). FIG. 8: memory map (entire face, eyes, mouth, nose, eyebrows).]



U.S. Patent 



Jan. 20, 1998 



Sheet 5 of 10 



5,710,590 



AC 
UJ 
Q 

121 

X 



CO 
Ui 



f 

O 



CO 

o 
m 

Ui 
UJ 



FIG.9 



Ui 

CO 

o 

2 



UJ 

a 
x 



CO 
IU 

> 

Ui 



X 

o 
2 



CO 

o 

QC 
CD 

Ui 



Ui 
CO 

o 
z 



UJ T" 

E uj 

UJ U. 



Ui CM 

£ ui 

UI u. 



(— N/N) 



FIG.10 



140 



141 143 



DISPLAY 





PROJECTION 




UNIT 










MEMORY 



142 



07/26/2002, EAST Version: 1.03.0002 



[Sheet 6 of 10 — FIGS. 11A-11C: image model of the human mouth. FIG. 11A: model of the mouth (upper lip, lower lip, upper teeth, lower teeth, interior of the mouth). FIG. 11B: basic form. FIG. 11C: information of variation (vertical opening, horizontal opening).]



[Sheet 7 of 10 — FIG. 12: block diagram of the third embodiment (memory 120, extraction unit 125, displacement evaluator, coder 127, control unit, synthesize unit 132, memories 180a/180b, multiplexer 128, transmission/reception unit 129, demultiplexer 130, decoder 131). FIG. 13: memory map (entire face, eyes, mouth, nose, eyebrows).]



[Sheet 8 of 10 — FIG. 14: frame structure of the third embodiment (basic information followed by displacement information over m frames). FIGS. 15A and 15B: features to be computed for each constituent element (width and height (W, H), coordinates of center of mass, size, gradient, color).]


[Sheet 9 of 10 — FIG. 16: example contents of the database, listing parts by type number (hair 0, face 1, right eye 2, left eye 3, right eyebrow 4, left eyebrow 5, mouth 6) against element numbers. FIG. 17: knowledge-description stream: a frame demarcation code followed by a description of elements for each of frames n and n+1, each element described as (element no., color (r-y, b-y), position (x, y), size).]
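A stream in the FIG. 17 format, a frame demarcation code followed by one record per constituent element holding (element no., color (r-y, b-y), position (x, y), size), could be serialized as in this sketch; the field widths and the demarcation value are assumptions for illustration, not values from the patent:

```python
# Sketch of a FIG. 17-style knowledge-description stream.  The struct field
# widths and the demarcation byte value are illustrative assumptions.
import struct

DEMARCATION = b"\xff\xd8"          # assumed frame-demarcation code
RECORD = struct.Struct("<BbbHHB")  # element no., r-y, b-y, x, y, size

def encode_frame(elements):
    """Serialize one frame: demarcation code, then one record per element."""
    out = bytearray(DEMARCATION)
    for no, ry, by, x, y, size in elements:
        out += RECORD.pack(no, ry, by, x, y, size)
    return bytes(out)

def decode_frame(data):
    """Recover the element records of one frame."""
    assert data[:2] == DEMARCATION
    body = data[2:]
    return [RECORD.unpack_from(body, i)
            for i in range(0, len(body), RECORD.size)]

mouth = (6, 40, -20, 120, 200, 30)   # element no. 6 = mouth, per FIG. 16
frame = encode_frame([mouth])
```

Each element costs only a handful of bytes under this layout, which is what makes the knowledge-description stream so much smaller than the pixel data it stands in for.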



[Sheet 10 of 10 — FIG. 18: image on the receiver side immediately after the communication line is established. FIGS. 19A and 19B: model image and transmitted image.]



IMAGE SIGNAL ENCODING AND COMMUNICATING APPARATUS USING MEANS FOR EXTRACTING PARTICULAR PORTIONS OF AN OBJECT IMAGE

BACKGROUND OF THE INVENTION

The present invention relates to an image encoding and picture communication apparatus, for example a videophone and an image recording apparatus.

Conventionally, as communication apparatuses to communicate voices and images, there has been used a video telephone facility (A. N. Netravali, B. G. Haskell, "Digital Picture", pp. 115-119, AT&T Bell Lab. (1988)). The apparatus includes a sending facility including an imaging apparatus, a voice input device, and an encoder circuit for encoding images and voices; a receiving facility including a decoder for decoding signals of images and voices and a display including a speaker and a CRT; and a communication controller for communicating images and voices via a network. In such conventional apparatuses, the contents of an image produced by a camera are entirely encoded and transmitted via a transmission line, which leads to the necessity of transmitting a large amount of data. Consequently, a low-priced videophone of a type conducting communication via a low-speed analog communication line has been attended with a problem in which the picture quality is considerably deteriorated or the motion of pictures becomes uncomfortable and unnatural.

Various attempts have been made to cope with the problem above. For example, according to a videophone apparatus described in JP-A-57-129076, a background image prepared beforehand is compared with a video currently being produced so as to accordingly clear the background, thereby achieving security control and minimizing the amount of image information to be transmitted.

However, when users of the system conduct communication while viewing images of each other, the images of the persons are most important in ordinary cases; namely, the background images of the respective persons are less important in many cases. In consideration of effective allocation of the limited amount of codes, it can consequently be regarded as inefficient to uniformly encode the constituent elements of an image, i.e., to assign the same quantity of codes to objects having different values of significance to the communicating users.

Additionally, the videophone apparatus described in JP-A-57-129076 requires preparation of the background image in advance; namely, consideration has not been given to operability and usability for the users.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a picture communication apparatus which can be used even through a low-speed transmission path such as an analog phone line while retaining a satisfactory quality of picture.

To achieve the object according to the present invention, there are arranged image extraction means for extracting images of particular portions of an object, coding means for coding the extracted image portions, and means for communicating the image data to a partner.

The image extraction means extracts images of particular portions of a subject. Each of the extracted images is encoded in an encoding method, or encoded by changing encoding parameters, to produce a quantity of codes according to the significance of the pertinent image portion. This optimally distributes codes to the respective portions of a screen image.

Moreover, in a videophone apparatus and a video conference system, a video signal produced by an imaging apparatus is encoded to be transmitted via a transmission path such as an integrated services digital network (ISDN). For example, for component National Television System Committee (NTSC) signals, when the signals are not compressed, the transfer speed in terms of bits is 216 megabits per second (Mbps) according to the studio standards of color television. This leads to a requirement that the signals be more efficiently encoded to reduce the number of bits of transmission data. As the encoding method, there has been primarily employed a method described in pages 793 to 799 of the "Journal of Institute of Television Engineers of Japan", Vol. 45, No. 7 (1991). Namely, there is basically used a conditional pixel supplementing method on the basis of inter-frame estimation or forecasting, in which only mobile portions are transmitted, such that other encoding methods such as the discrete cosine transform (DCT) are additionally used in combination with the conditional pixel supplementing method. Thanks to the development of such highly efficient encoding technology, videophones and video conference systems using ISDN lines have been widely introduced to practical uses in business and industrial fields. A communication method in which an image is transformed into codes for transmission thereof has been described, for example, in JP-A-62-120179 and in "Systematic Image Encoding" written by Makoto Miyahara, pages 1 to 14 of the IPC.

Although there have been known low-cost communication systems such as a videophone using analog transmission, to carry out transmission at a low transmission rate, the picture quality has been sacrificed to a considerable extent. This consequently leads to the following problems: expression appearing particularly in a human face cannot be satisfactorily transmitted or displayed, and variations in the expression cannot be communicated in a realtime fashion, resulting in an unnatural motion of the face.

Another object of the present invention is to provide a videophone system capable of producing a high-quality video in a realtime manner even through such a transmission line having a low transmission rate as an analog line, thereby solving the above problem.

To achieve the above object according to the present invention, there is provided a videophone system including a database keeping knowledge of models related to a subject, a video camera including extraction processing means for extracting the subject shot by the camera and computing features thereof, encoding means for analyzing the features from the extraction processing means and converting the features into a description of knowledge corresponding to the database, interface means for converting the description of knowledge generated by the encoding means into signals conforming to a signal system of a signal transmission path, transmitting the knowledge description to a receiver, and converting a signal sent from a sender into a description of knowledge, and decoding means for composing a video signal according to the knowledge description sent from the sender by making reference to the database.

The constituent means above operate to achieve the object as follows.

When the sender transmits an image, the video camera, which includes, in addition to the extraction processing means and encoding means, a signal processing circuit and a control circuit commonly used for digital video cameras, conducts signal processing known in the processing of video signals produced by an imaging apparatus to resultantly generate



such signals of the image as video signals. The extraction processing means extracts the subject from the video signal generated by the signal processing circuit to compute such features of elements of the extracted object as the size, contour, color, coordinates of center of mass, and gradient. The encoding means, including a microcomputer or the like, analyzes the information of the features from the extraction processing means, recognizes the elements constituting the object and the states thereof, and transforms the recognized information items into knowledge description corresponding to the database including knowledge of models related to the subject. The interface means transforms the knowledge description generated by the encoding means into signals conforming to a signal system of the transmission path and transmits the resultant signal through the transmission path.
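The feature computation attributed to the extraction processing means above (size, width and height, coordinates of center of mass) can be sketched for a binary extraction mask; the mask representation and the function name are illustrative assumptions, and contour, gradient, and color would be derived similarly from the mask and the underlying pixels:

```python
# Sketch of per-element feature computation over a binary extraction mask:
# size (pixel count), width/height (W, H), and coordinates of center of mass.

def features(mask):
    """mask: 2-D list of 0/1 values marking one extracted element."""
    pts = [(x, y) for y, row in enumerate(mask)
                  for x, v in enumerate(row) if v]
    if not pts:
        return None  # element absent from this frame
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return {
        "size": len(pts),
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
        "center_of_mass": (sum(xs) / len(pts), sum(ys) / len(pts)),
    }
```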

The signal received via the transmission path is converted by the interface means into knowledge description. The decoding means decodes the knowledge description to reconstruct the transmitted image. In this operation, the decoding means accesses the database keeping therein a large number of images of models related to the object and then selects therefrom the video data items associated with the elements constituting the image sent from the sender so as to restore the original image.

That is, the sender does not transmit the image itself. The image of an object such as a human face to be transmitted is beforehand transformed into knowledge description representing the image, such that the knowledge description is sent as transmission data. In the receiver, the knowledge description is decoded into the image of the subject as above.

With this provision, the amount of transmission data is remarkably minimized, and hence it is possible to construct a videophone system capable of communicating high-quality pictures in a realtime manner even through such a communication line having a low transmission rate as an analog telephone line.
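The scale of this reduction can be checked with rough arithmetic. The 216 Mbps studio rate quoted in the background follows from 4:2:2 studio sampling (a 13.5 MHz luminance signal plus two 6.75 MHz color-difference signals at 8 bits each; these sampling figures are standard values, not stated in the patent), while a knowledge-description stream in the style of FIG. 17 needs only a few bytes per element per frame. The per-element byte count and the frame rate below are illustrative assumptions:

```python
# Back-of-the-envelope comparison of the raw studio video rate against a
# knowledge-description stream.  Sampling rates are the standard 4:2:2
# studio values; the byte count and frame rate are illustrative assumptions.

raw_bps = (13.5e6 + 2 * 6.75e6) * 8   # 216 Mbps uncompressed studio rate

bytes_per_element = 8     # assumed: no., color (r-y, b-y), position (x, y), size
elements_per_frame = 7    # hair, face, eyes, eyebrows, mouth, per FIG. 16
frames_per_second = 30

knowledge_bps = bytes_per_element * elements_per_frame * frames_per_second * 8

ratio = raw_bps / knowledge_bps       # roughly a 16,000:1 reduction
```

Even with generous field widths, the description stream fits easily inside an analog telephone line's modem bandwidth, which is the point of the second object.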

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects and advantages of the present invention will become apparent by reference to the following description and accompanying drawings wherein:

FIG. 1 is a diagram showing a first embodiment of an image communicating apparatus according to the present invention;

FIG. 2 is a diagram showing the overall configuration of an image communicating apparatus according to the present invention;

FIG. 3 is a diagram showing the constitution of an image extracting circuit of the first embodiment;

FIG. 4 is a diagram for explaining an encoding method of the first embodiment;

FIG. 5 is a diagram for explaining the encoding method of the first embodiment;

FIG. 6 is a diagram showing a second embodiment according to the present invention;

FIG. 7 is a diagram showing the structure of encoding means of the second embodiment;

FIG. 8 is a diagram showing a memory map of video data in a storage;

FIG. 9 is a diagram for explaining an encoding method of the second embodiment;

FIG. 10 is a diagram showing the constitution of a display of the second embodiment;


FIGS. 11A to 11C are diagrams for explaining an image model of a human mouth;

FIG. 12 is a diagram showing a third embodiment according to the present invention;

FIG. 13 is a diagram showing a memory map of video data in a memory;

FIG. 14 is a diagram showing the encoding method of the third embodiment;

FIGS. 15A and 15B are diagrams for explaining an example of the method of converting an image into knowledge description;

FIG. 16 is a diagram showing an example of the contents of a database;

FIG. 17 is a diagram showing an example of knowledge description;

FIG. 18 is a diagram showing an image on the receiver side immediately after the communication line is established between the sender and the receiver; and

FIGS. 19A and 19B are diagrams for explaining a method of receiving an image.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Next, description will be given of an embodiment of a picture communication apparatus according to the present invention.

FIG. 2 shows the structure of a picture communication apparatus such as a videophone facility, including a user 1 conducting communication via the apparatus, a video input apparatus 2, a voice input device (microphone) 3, a display device 4, a codec 5, and a communication network 6.

The user 1 of the communicating apparatus communicates via the communication network with a partner using a similar communicating apparatus at a remote place. The imaging device 2 shoots an image of the user 1 and then inputs a video signal of the image to the codec 5. The microphone 3 transforms the voice of the user 1 into a voice signal to be fed to the codec 5. The codec 5 encodes the video and voice signals into a signal of code (communication signal) conforming to the network 6 and then supplies the signal to the network 6. In addition to transmission of the communication signal from the user 1 to the network 6, the codec 5 conducts reception of a communication signal sent from the communicating partner via the network 6 and then decodes the signal to restore the video and voice signals of the partner. The resultant video and voice signals of the communicating partner are fed to the display 4 to be presented as an image and a sound.

FIG. 1 shows an example of the structure of the codec 5 of FIG. 2, including an input terminal 20, output terminals 21 and 22, an input and output terminal 23, an extracting circuit 24, an input signal before the extracting process 25a, an input signal after the extracting process 25b, encoding circuits 27a and 27b, a multiplexer 28, a transmission/reception unit 29, a demultiplexer 30, decoders 31a and 31b, and a signal synthesize unit 32. According to the present invention, voices are processed in the ordinary known method and hence description thereof will be omitted. The signals of the user 1 produced from the imaging apparatus and the microphone are received via the input terminal 20. The signal is encoded by encoding means on the sender side including the extracting circuit 24, the encoder circuits 27a and 27b, and the multiplexer 28. The encoded signal is transformed by the sending and receiving section 29 into a communication signal to be outputted via the input and output terminal 23 to the network. The sending




and receiving section 29 conducts the transmission and reception at the same time and receives via the input and output terminal 23 a communication signal containing an image and a voice from the communicating partner. The signal is decoded by decoding means including the separating circuit 30, the decoding circuits 31a and 31b, and the composing circuit 32 to restore the image signal of the partner. The image signal is then delivered from the output terminal 22. The video signal is sent to the display 4 to be represented as the image of the partner. Although not shown, when the image of the user 1 is to be displayed on the display 4 for confirmation, it is only necessary that a change-over operation be conducted in the sending and receiving section 29 to treat the transmission signal as a reception signal. Alternatively, the input video signal need only be supplied to the composing circuit 32 to be mixed with a received image so as to compose an image to be presented on the display 4.

On receiving the signal from the extracting circuit 24, the control circuit 26 sends a control signal to the imaging apparatus to obtain an optimal input image. The input image signal is first fed to the extracting circuit 24 to extract partial images of the object. In this embodiment, the shooting object is the user of the apparatus. The partial images include the eyes, mouth, etc. of the user. Since the images of the eyes and mouth vary in contour more frequently than those of the other elements of the object, it is necessary to allocate a larger quantity of information items thereto. The extracted partial images (extraction signal 25b) and the other partial images (non-extraction signal 25a) are inputted to the encoders 27b and 27a, respectively, for the encoding thereof. Although the encoding method is not limited, to restore a picture of a higher quality for the extracted partial images, a greater number of codes are generated from the encoder circuit 27b. For the encoder circuit 27a, there may be utilized any encoding methods ordinarily used for videophones (reference is made to ITU-T Recommendation H.261, Video codec for audiovisual services at p×64 kbits (1993), and to "encoding technology for videophone and television conference" written in page 793 of the Journal of Institute of Television Engineers of Japan, Vol. 47 (1991), No. 7). The encoder 27b may be operated according to an encoding method such as an entropy encoding method (reference is made to page 106 of "Fundamentals of Electronic Imaging Systems" written by W. F. Schreiber and published from Springer-Verlag in 1993).

The multiplexer 28 multiplexes the signals of codes produced from the encoders 27a and 27b in the preceding stage and sends the multiplexed signal to the sending and receiving section 29. The extracting circuit 24 conducts, in addition to the extraction of partial images, an operation to compute for each extracted portion the size, contour, and position of a reference point or coordinates of center of mass of the extracted portion, and then outputs the resultant data items to the controller 26.

To sense an object and to obtain features thereof, there may be adopted, for example, a method described in JP-A-59-208983 in which the features of an object are attained from differences between images sampled at a fixed interval of time. Alternatively, there may be utilized a method described in JP-A-4-205070 in which portions of a video signal satisfying a preset condition, for example, a condition determined according to a luminance signal and a color difference signal, are regarded as candidates of the object. The current candidate region thus extracted is compared with a region of the object obtained before a predetermined period of time and stored in storage means to determine an area in which these regions are overlapped with each other, thereby finally extracting as the region of the object an area surrounding the overlapped area.

According to the data items from the extraction circuit 24, the control circuit 26 delivers a control signal from the output terminal 21 to regulate the direction or orientation and the ratio of magnification of the imaging apparatus 2. As a result, the imaging apparatus 2 is desirably and automatically oriented to the user 1 to shoot an image having an appropriate size. The processing procedure is executed as necessary so that the imaging apparatus 2 automatically follows the movement of the communicating person in front thereof. To adjust the orientation and magnification ratio of the imaging apparatus 2, the apparatus 2 may be mechanically or electronically operated. In an imaging apparatus including imaging devices such as charge-coupled devices (CCDs), the electronic adjustment of orientation above can be achieved by using CCDs including marginal pixels other than those used for the output of the imaging apparatus. In addition, the magnification ratio can be electronically adjusted by an operation generally called electronic zooming. In the encoding circuit 27b, for the decoding operation to be achieved later, the size and position of the extracted image are encoded together with the extracted image.

In operation on the receiver side, the received signals are separated by the separating circuit 30 into codes of the extracted portions and those of the other portions. The separated codes are decoded by the decoders 31a and 31b corresponding to the encoders 27a and 27b, respectively. There are resultantly attained images of the extracted portions and images of the non-extracted portions. These images are fed to the composing circuit 32 to produce an image according to the information items of the sizes and positions of the extracted images.

FIG. 3 shows an example of the constitution of the extracting circuit 24 of FIG. 1. The extracting method is basically identical to that described in JP-A-4-205070. In the configuration, reference numerals 40 and 43 designate input terminals, numeral 41 denotes memory means including one-bit data for each input pixel and keeping therein results of decision for extraction areas, numeral 45 indicates memory means, numeral 42 stands for a decision circuit, numeral 44 represents an address generating circuit, and numerals 46 and 47 designate output terminals. A video signal is fed via the input terminal 40 to the decision circuit 42. The input terminal 43 is employed to input therefrom an extracting condition for each extraction portion. In this situation, it is allowed to specify levels of the luminance and chroma signals as the extraction condition. A plurality of condition items are set for each extraction portion. For example, for the portion of the mouth, a plurality of combinations of luminance and chroma signal levels are set as a red portion of the lip and a white portion of teeth. Since the lip color alters between persons, the luminance and chroma signal levels have allowance ranges, respectively. The decision circuit 42 decides the image areas conforming to the extracting conditions in an image received from the input terminal 40. The memory means 41 stores therein the results of processing of the decision circuit 42, namely, the extraction image areas for each frame. Furthermore, the memory means 41 stores the extraction image areas for each extraction portion. The results of decision are inputted again to the decision circuit 42 to be utilized as a candidate of an extraction region for the next frame. That is, the decision circuit 42 produces an image region by slightly expanding the previous extraction area for each extraction portion kept



in the memory means 41 to make a decision for each extraction portion in the produced region. The decision circuit 42 computes the size and position of the image for each extraction portion to deliver the results from the output terminal 46. The data items of the size and position are employed to control the imaging apparatus. According to the control operation, the image of the user's face can be created in a fixed contour. The positional data item of each extraction portion is delivered to the address generating circuit 44 to generate an address in the memory means 45, thereby storing the extraction portions in separate locations, respectively. FIG. 4 shows an example of the results of address generation in which video data of the extraction portions are combined with each other to configure one frame, such as a CIF (Common Interchange Format) frame, for transmission. Image data stored in the memory means 45 is later read therefrom to be delivered from the output terminal 47.

FIG. 5 shows a method of multiplexing two kinds of video data items, including those of the extracted portion (code A) and the non-extracted portion (code B). The multiplexing is conducted as follows: for the extracted portion, the latest data is transmitted for each frame; whereas, for the non-extracted portion, one image is transmitted in an interval of a predetermined number of frames. Moreover, since the extracted portion includes a partial image, when the image is decoded later by the decoder, there is required information of the reproducing position. Consequently, the code A also includes positional information of each extracted image.

According to the embodiment, a greater quantity of codes can be allocated to such images having a larger amount of information as the images related to the mouth and eyes. Resultantly, the quantity of overall transmission data necessary for achieving a satisfactory quality of image can be reduced, or an image of a higher quality can be obtained without increasing the amount of transmission data.

FIG. 6 shows an alternative embodiment according to the present invention. When compared with the preceding embodiment, this embodiment includes a solid image display of the human head in place of the display of the preceding embodiment. A reference numeral 90 stands for the solid image display and a numeral 91 indicates an encoding apparatus.

FIG. 7 shows the constitution of an encoder circuit including an input terminal 120, output terminals 121 and 122, an input terminal 123, and an input and output terminal 124.

On the solid image display, images are presented at the positions of the respective constituent components of the face. An example of the solid image display has been described, for example, in JP-A-5-27346 and JP-A-3-22753. In a flat-plane or two-dimensional display, it is only necessary that the sizes of the respective components approximately reflect those of the actual components of the object in the imaging operation thereof. However, in a three-dimensional display, the positions of the components are required to correctly reflect those of the components of the solid display. For this purpose, the positions of the eyes and mouth of the face are adjusted to fit the display, to resultantly decide the sizes of the elements in an automatic fashion. In this connection, the extracting circuit 125 computes the positional relationships between the elements to attain the positions of the eyes and mouth in the extracting operation. In the position adjustment, the control circuit 105 is operated according to the positions obtained by the extracting circuit 125 to adjust the imaging position and magnification ratio; the adjustment is appropriately carried out between the vertical and horizontal directions by the so-called electronic zooming function. The image of the face is regulated on the sender side to match the solid image display 90; thereafter, the constituent components are extracted.

The extracting circuit 125 is configured in the same way as that of the preceding embodiment shown in FIG. 3. However, in a case where the extracted face includes a plurality of constituent elements or a variable number of constituent elements, it is difficult to combine the extracted elements to match the format of the transmission frame as described above (FIG. 4). Consequently, there will be introduced another layout of the memory means 45, as shown in FIG. 8. This layout holds the respective constituent elements simply in memory blocks of predetermined sizes. The data items outputted from the extracting circuit 125 can be attained by issuing a read command to the memory means 45; namely, the obtained data items constitute an image of each extracted portion.

The encoder 127 encodes the output from the extracting circuit 125. The encoding method or the various parameters used in the encoding of data are determined, as described in conjunction with the preceding embodiment, according to the kind and the priority level of each extracted image. Each of the encoded images is multiplexed by the multiplexer 128 according to the priority level thereof.

FIG. 9 shows an example of the multiplexing process. Each frame includes a header code field containing, for

extraction processing circuit 125, a control circuit 126, an example, information indicating an internal format of the 

encoding circuit 127, a multiplexer circuit 128, a sender and perdu cut frame and a frame identifier (ID) to identify the 

receiver section 129, a separating circuit 130, a decoder 50 frame and a video data field of each extracted portion. Video 

circuit 131, and a composing circuit 132. Functions of the data items of the respective constituent elements are distrib- 

respective circuit blocks are the same as the corresponding uted according to the priority levels so that the related 

constituent components of the preceding enibodiment The portions of a predetermined number of frames constitute 

extracting circuit 125 extracts the elements of the face and information of one frame. According to the distribution 

the entire face to present the human face on the display 90. 55 method of FIG. 9, only the portions of the face having a 

Assume that the constituent elements to be extracted are the lower priority level are distributed into a predetermined 

overall face, eyes, mouth, nose, eyebrows, etc. The elements number of frames for transmission thereof. The method of 

are assigned with priority levels for the encoding operation distributing codes into a plurality of frames requires a data 

thereof. For example, variation in the image of the entire buffering operation in the multiplexer 128. 

face is less than that in the image of the mouth and hence the ^ The multiplexed codes are transmitted via the sender and 

entire face is assigned with a lower priority leveL The eyes receiver section 129 to the network, on the receiver side, the 

and mouth are equally important in this regard and accord- codes are processed primarily by the demultiplexer circuit 

ingly assigned with the same priority level so as to allocate 130, decoder circuit 131, and synthesize unit 131 The 

a larger quantity of codes thereto. demultiplexer circuit 121 separates codes for each priority 

When it is impossible to vary the shape of solid image 65 level from the multiplexed codes from the sender side. The 

display according to the face, the display 90 is subjected to separated codes are respectively decoded by the decoder 131 

a model of a face having average features to fix the positions to reconstruct images of the respective constituent elements. 
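The priority-dependent distribution of FIG. 9 may be sketched in software as follows. This is an illustrative model only: the patent describes hardware circuits, and the element names, the three-element set, and the four-frame period used here are assumptions.

```python
# Sketch of FIG. 9 style priority multiplexing: high-priority
# elements (eyes, mouth) are carried in every frame, while the
# low-priority whole-face image is spread over PERIOD frames.

PERIOD = 4  # low-priority data is split across this many frames

ELEMENTS = {
    "eyes":  {"priority": "high"},
    "mouth": {"priority": "high"},
    "face":  {"priority": "low"},
}

def build_frame(frame_id, encoded):
    """Assemble one transmission frame: header + per-element fields.

    `encoded` maps element name -> encoded bytes for this frame.
    """
    fields = [("HDR", frame_id)]  # header: frame identifier (ID)
    for name, props in ELEMENTS.items():
        if props["priority"] == "high":
            # the latest data is transmitted for every frame
            fields.append((name, encoded[name]))
        else:
            # one slice of the low-priority image per frame;
            # PERIOD consecutive frames reassemble the whole image
            data = encoded[name]
            n = len(data)
            lo = (frame_id % PERIOD) * n // PERIOD
            hi = ((frame_id % PERIOD) + 1) * n // PERIOD
            fields.append((name, data[lo:hi]))
    return fields

# Over PERIOD frames the "face" slices cover the whole image.
enc = {"eyes": b"E" * 8, "mouth": b"M" * 8, "face": b"F" * 16}
slices = [dict(build_frame(i, enc))["face"] for i in range(PERIOD)]
assert b"".join(slices) == enc["face"]
```

The buffering that the patent notes as necessary in the multiplexer 128 corresponds here to holding the full low-priority image while its slices are doled out.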



07/26/2002, EAST Version: 1.03.0002 



5,710,590 

9 10 

Since the frame frequency varies between video data items
received for the respective priority levels, the decoder 131
includes memory means for updating video data in the
memory for each constituent element. The internal memory
means may be configured in the memory format of the
extracting circuit shown in FIG. 8. Signals read from the
memory means are used as output signals from the decoder
circuit 131.

On receiving the video output from the decoder 131, the
synthesize unit 132 composes an image of each constituent
element. Since the positions of the respective elements
conform to information from the display 90, it is unneces-
sary for the sender side to transmit positional information
together with the video data. However, if the format for the
display 90 is unique, the display 90 need not transmit even
the information above.

FIG. 10 shows structure of the solid image display 90
including a display unit 140 thereof, a projecting unit 141 for
projecting a picture onto the display unit 140, memory
means 142, and an input and output (I/O) terminal 143. The
memory means 142 stores therein data related to formats of
the display unit 140. The data includes data representing
three-dimensional positions and sizes of the constituent
elements of the face. Since these data items are not changed
if the shape of the display is not variable, there is only
required a read-only memory fixed for the display. Infor-
mation of formats is sent via the I/O terminal 143 to the
encoding apparatus such that the encoder supplies the dis-
play with an image conforming to the display. As above, if
information is communicated between the display and the
encoding apparatus, it is possible to employ a display of
another type.

When an image communication system includes the solid
image display as above, there is obtained, in addition to the
effect of the preceding embodiment, an advantageous fea-
ture that the communication partner is reproduced in the
vicinity of the user in a three-dimensional manner. As a
result, the communication can be achieved as if the partner
were in front of the user of the apparatus. Moreover,
according to the embodiment, only the image of the human
face is transmitted; namely, the background image is not
included in the transmission data. Furthermore, the quantity
of codes is allocated for each portion according to the
priority level or significance level thereof. Consequently,
high-quality pictures can be transmitted even through a
transmission path of a low transmission rate.

FIG. 12 shows an alternative embodiment according to
the present invention. The diagram specifically shows con-
stitution of the encoding apparatus in which the same
constituent components as those of the preceding embodi-
ments are assigned with the same reference numerals. The
apparatus of FIG. 12 includes memory means 180a, 180b, a
displacement evaluator 181, an encoder circuit 182, and a
decoder circuit 183.

In this embodiment, in addition to extracting partial
images of the face, there is conducted an operation to encode
information related to the structure of each portion of the
face. As described above, the human face includes a plural-
ity of portions and each portion has its own structure. FIG.
11 shows an example of the structure of the human face.

The image of the mouth section is considered to include
the upper lip, lower lip, upper teeth, lower teeth, and interior
of the mouth as shown in FIG. 11A. These images do not
basically vary for a person during communication thereof.
Consequently, information thereof can be classified into
basic image information (basic information) and informa-
tion of variation or deformation thereof (variation
information). For the mouth image, the basic information
includes lip image data as shown in FIG. 11B and the
variation information can be specified by the opening
between the upper and lower lips as shown in FIG. 11C.
Video data of a variation of the mouth can be reconstructed
by modifying the basic information according to the varia-
tion information. Similar processing also applies to the other
extracted portions.

An image supplied via the input terminal 120 is fed to the
extraction circuit 125 and undergoes an extracting operation.
Basic information obtained as a result of extraction is stored
in the memory means 180a and extracted images changing
in a continuous manner are supplied to the displacement
calculating circuit 181. The basic information for the mouth
portion can be decided in two methods. In a first method, a
point of time to acquire basic information is specified by the
user. In a second method, basic information is obtained by
the apparatus. In an example of the mouth, information
related to an image of the mouth in an ordinarily closed state
is assumed as basic information, whereas the magnitude of
variation thereof is used as variation information. In the first
method, a point of time when the mouth image in the closed
state is obtained is determined by the user. In the second
method, an image of only the mouth is attained by an
extracting operation. The magnitude of opening of the
mouth is monitored after the communication is started, or
during a fixed period of time beginning at a predetermined
point of time, so as to decide a point of time when the
magnitude takes a minimum value. This point of time is
assumed to be when the mouth is closed, thereby attaining
the basic information.

The basic information for the mouth portion attained by
either one of these methods is compared with the extracted
image at the specified point of time by the displacement
calculating circuit 181 to obtain information of displace-
ment. The encoding circuit 182 receives as inputs thereto the
basic information and variation or displacement information
and then encodes the information. For each extracted image,
the obtained codes respectively of basic and variation infor-
mation items are multiplexed by the multiplexer 128 to be
transmitted via the sender and receiver section 129.

FIG. 13 shows the memory format of data items in the
memory means 180a. The format may be similar to that of
the memory means in the extraction circuit 125.

FIG. 14 shows the multiplexed data format employed by
the multiplexer 128. Basic information is transmitted for
each predetermined number of frames. A frame not contain-
ing the basic information is used to send variation informa-
tion. Each of the basic and variation information items
includes data items of the respective extraction portions.

In the data reception, received codes are disassembled by
the demultiplexer circuit 130 into codes of the respective
extraction blocks. Moreover, the codes are classified into
those of basic information and those of displacement infor-
mation. Each unit of separated information is decoded into
data of basic or displacement information by the decoder
183. The basic and displacement information items are then
sent to the memory means 180b and the composing circuit
132, respectively. The composing circuit 132 reads the basic
information from the memory means 180b to execute an
operation of transforming the basic information according to
the displacement information to reproduce each extraction
portion, and then arranges the respective extraction portions
at the pertinent positions to compose an image. The com-
posed image is delivered as an output image from the output
terminal 122.
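The basic/variation scheme for the mouth portion can be sketched as follows. This is an illustrative model: the helper names and the scalar "opening" measure are assumptions, since the patent states only that the opening magnitude is monitored and its minimum is taken as the closed-mouth (basic) state.

```python
# Sketch of the basic/variation decomposition for the mouth.
# A frame is modeled as a dict; "opening" stands in for the
# measured gap between the upper and lower lips (an assumption).

def opening_magnitude(frame):
    """Hypothetical measure of mouth opening for one extracted frame."""
    return frame["opening"]

def choose_basic(frames):
    """Second method of the text: pick the frame whose opening is
    minimal during the monitoring period and treat it as the
    closed-mouth basic information."""
    return min(frames, key=opening_magnitude)

def encode_variation(basic, frame):
    """Variation information: displacement relative to the basic frame."""
    return frame["opening"] - basic["opening"]

def reconstruct(basic, variation):
    """Receiver side: modify basic information by the displacement."""
    return {"opening": basic["opening"] + variation}

frames = [{"opening": o} for o in (5, 1, 7, 3)]
basic = choose_basic(frames)            # the frame with opening == 1
codes = [encode_variation(basic, f) for f in frames]
assert basic["opening"] == 1
assert all(reconstruct(basic, c) == f for c, f in zip(codes, frames))
```

Only the small displacement values need to be sent every frame; the larger basic information travels once per predetermined number of frames, as FIG. 14 describes.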







According to the embodiment described above, each
extraction portion is disassembled into basic information
including basic image data and displacement information
including displacement data relative to the basic information
so as to transmit the resultant codes. The basic information
including a larger number of codes is not transmitted at each
frame. Namely, the basic information is transmitted at an
interval of a predetermined number of frames, whereas the
displacement information including a lower number of
codes is contained in each frame to be transmitted. This
remarkably decreases the quantity of transmission codes.

Next, description will be given of a process in which an
image attained by an imaging apparatus is transformed into
knowledge description for transmission and received video
data including knowledge description is converted into an
original image by reference to a database containing knowl-
edge description data.

Specifically, when an image extracted by the extracting
circuit 24 is encoded by the encoder 27b, the database 40 is
referenced to transform the extracted image into knowledge
description.

Furthermore, when receiving image data in the form of
knowledge description, the decoder 31b accesses the data-
base according to the knowledge description to thereby
decode the video data into the original image. In this
operation, video data items corresponding to the respective
elements constituting the image transmitted from the sender
side are selectively read from the database including a
multiplicity of images of models associated with objects to
be imaged. The selected video data items are combined with
each other to restore the original video image. Next, descrip-
tion will be specifically given of knowledge description. For
methods of describing knowledge, reference is to be made,
for example, to Chapter 8 (pages 132 to 139) of "Intelligent
Image Processing" written by Agui and Nagahashi and pub-
lished from Shokodo in 1994.

An example of the method of transforming an image of a
human into knowledge description will be described by
reference to FIGS. 15A and 15B. FIG. 15A shows an image
of an object obtained when shooting a person by an imaging
apparatus. From this image, an image related to the person
is extracted to be disassembled into constituent elements
such as the hairs, face, eyes, mouth, and body so as to obtain
features including the coordinates of center of mass, width,
height, size, and color of each element. There are also
acquired such features as the width of the iris of each eye, the
width and height of the interior of the mouth, and gradient
values of eyes and eyebrows. These features are transformed
into data items respectively assigned with element numbers
in association with the database as shown in FIG. 16.

FIG. 17 shows an example of knowledge description. For
each element, one set of knowledge description items is
specified in the form of (element number, color(r-y,b-y),
position(Δx,Δy), size). In this expression, position(Δx,Δy)
indicates the discrepancy between the coordinates of the
center of mass of the pertinent object and that of each
element. As can be seen from FIG. 17, data items of
knowledge description of constituent elements of the object
are described immediately after a frame demarcation code.
Assume that the object includes, for example, ten constituent
elements and each item such as the element number is
represented by an eight-bit data item. The amount of data
required for each frame resultantly becomes 480 bits. As
above, the volume of transmission data can be remarkably
reduced by converting an image into knowledge description.
In addition, when the system is configured to transmit only
the knowledge description of a constituent element changed
prior to data transmission, the amount of transmission data
can be much more decreased.

To restore the original image from the knowledge
description, images corresponding to element numbers of
the knowledge description are read from the database to be
combined with each other so as to compose the objective
image. When arranging each constituent element on the
screen, the position described as (0,0) for the element in the
knowledge description is aligned at the central position of
the screen. As described above, since the position indicates
the difference between the coordinates of the center of mass
of the object and that of each element, position (0,0) stands
for the center of mass of the object. With this provision,
there can be achieved correction of positions so that the
object continuously stands at the central position of the
screen in any situation.

In the direction of depth in the screen, the respective
images are presented with such a positional relationship that
the smaller items are arranged in the upper layers. Moreover,
when colors of images of constituent elements such as the
lips and the iris of each eye in the database are replaced with
those expressed by the knowledge description, the restored
image will become more similar to the original image on the
sender side.

As above, the image itself is not used as the transmission
data. The image of a transmission object (such as a human
face) is transformed into knowledge description represent-
ing the image so as to send data of the knowledge descrip-
tion to the communicating partner. On the receiver side, the
original image of the object is restored according to the
received knowledge description. In consequence, the amount
of transmission data is considerably minimized and there can
be provided a videophone system capable of producing a
high-quality picture in a realtime fashion even through such
a communication line having a low transfer rate as an analog
telephone line.

Additionally, it may also be possible in the data commu-
nication that important elements of the object are transmitted
in the form of knowledge description and the other elements
are transferred as video signals. In this operation, the knowl-
edge description is transmitted in a realtime manner,
whereas image information of the overall screen is trans-
mitted at a low transfer speed in the range of the transfer rate
of the communication path. When transmitting, for example,
an image of a human face, images of the eyes and mouth
important for communication are sent in a realtime fashion.
Furthermore, when the image of the object shot by the
imaging apparatus is extracted from the overall image by the
extracting circuit and the images of the remaining portions
are replaced with one color, the transmission data can be
more efficiently compressed.

However, since information of the entire screen is trans-
mitted at a low transmission speed in the method above, only
the eyes and mouth are displayed on the screen as shown in
FIG. 18 immediately after the communication line is estab-
lished. To overcome this difficulty, there may be prepared a
model image of the human head portion in the database 1.
Immediately after the communication line becomes
available, the eyes and mouth are composed according to the
knowledge description received in a realtime manner such
that the images of the eyes and mouth are combined with the
model image so as to display the composed image on the
screen as shown in FIG. 19A. As can be seen from FIG. 19B,
when the model image is replaced thereafter with images
sequentially received from the sender side, there is continu-



ously displayed a natural image even immediately after the
communication line is connected. Namely, the presented
image is gradually changed from the model image into the
human image of the sender without causing any undesirable
artificial expression, and hence the viewer can obtain a
naturally reproduced image.
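The gradual change from the model image to the sender's image may be viewed as a crossfade whose weight moves toward the received picture as it arrives; a minimal sketch follows (the weighting schedule is an assumption — the patent does not specify one):

```python
def blend(model_pixel, received_pixel, alpha):
    """Crossfade one pixel value: alpha=0 shows the model image,
    alpha=1 shows the image received from the sender."""
    return round((1.0 - alpha) * model_pixel + alpha * received_pixel)

# As more of the real image arrives, alpha rises and the display
# changes smoothly from the stored model to the sender's face,
# avoiding an abrupt switch between the two sources.
model, received = 100, 200
steps = [blend(model, received, a / 4) for a in range(5)]
assert steps[0] == 100 and steps[-1] == 200
assert steps == sorted(steps)  # monotone transition, no jump
```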

As above, even when there is utilized a communication
line of a low transmission rate such as an analog telephone
line, the elements of expression and the like of a human face
essential for communication can be transmitted in a realtime
fashion while transferring video data of the overall screen
image. This leads to an advantageous effect similar to that of
the embodiment shown in FIG. 1.
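The 480-bit figure given earlier follows from the knowledge description tuple (element number, color(r-y,b-y), position(Δx,Δy), size): six eight-bit fields per element and ten elements give 10 × 6 × 8 = 480 bits per frame. A sketch of such packing (the signed/unsigned field choices are assumptions for illustration):

```python
import struct

# One knowledge-description entry per constituent element:
# (element number, color r-y, color b-y, position dx, dy, size),
# each field stored in eight bits as in the patent's example.

def pack_element(num, ry, by, dx, dy, size):
    # "B" = unsigned 8-bit (element number, size);
    # "b" = signed 8-bit (color and position offsets) — an assumption.
    return struct.pack("BbbbbB", num, ry, by, dx, dy, size)

elements = [(i, 0, 0, 0, 0, 10) for i in range(10)]  # ten elements
frame = b"".join(pack_element(*e) for e in elements)

assert len(frame) * 8 == 480  # ten elements * six fields * 8 bits
```

A real frame would be preceded by the frame demarcation code the patent mentions; only the fixed-size element records are modeled here.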

While the present invention has been described with 
reference to the particular illustrative embodiments, it is not 
to be restricted by those embodiments but only by the 
appended claims. It is to be appreciated that those skilled in
the art can change or modify the embodiments without 
departing from the scope and spirit of the present invention. 

We claim: 

1. A picture communication apparatus, comprising:
imaging means; 
voice input means; 

extracting means for extracting at least one portion of an 
image of a subject from an image produced by said 
imaging means; 

encoding means for respectively encoding the image 
portion extracted by said extracting means and a voice 
inputted by said voice input means; 

communicating means for communicating via a commu-
nication network data obtained by encoding the image 
portion and the voice by said encoding means; 

decoding means for decoding data received from said 
communicating means and thereby restoring the 
extracted image portion and the voice; 




synthesizing means for composing an image;

a display having a surface including depressions and 
projections for displaying the image composed by said 
synthesizing means; 

a memory for storing information representing three-
dimensional positions and sizes of constituent elements 
of the image to be displayed on the depressions and 
projections of said display; 
data input/output means for transferring said information
from said memory to said synthesizing means, whereby 
said synthesizing means synthesizes the image accord- 
ing to the information received from said data input/ 
output means and the extracted image portion decoded 
by the decoding means to produce data representing the
synthesized image which is coordinated to the depres- 
sions and projections of said display in accordance with 
said three-dimensional positions and sizes; and 

projection means responsive to said synthesized image 
data for projecting a synthesized image onto said
display. 

2. A picture communication apparatus according to claim 
1, wherein 

the image of the extracted portion is a part of a human 
face; and

the depressions and the projections in the surface of the 
display have a general contour similar to a human face. 

3. A picture communication apparatus according to claim 
1, wherein the image of the extracted portion is a part of the
human face of a user of a picture communication apparatus;
and

the information stored in said memory represents the 
remaining non-extracted portion of the human face of 
said user.

* * * * *


